ABSTRACT

To determine the concurrent validity of a standardized test and its
usefulness to educators, reading ratings from the Iowa Tests of Basic
Skills (ITBS) were compared to teacher ratings and independently
administered placement test ratings. Two hundred one primary children,
the majority of whom read one or more years below grade level, took
the ITBS under a graded testing plan. Correlation analysis, analysis
of variance, and examination of difference scores supported the
conclusion of low concurrent validity for the standardized test with
low-achieving readers' instructional reading levels. Out-of-grade
testing was recommended for low-achieving primary children. (Author)
Teachers and Tests: A Concurrent Validity Study



The usefulness of a standardized test depends on several factors
such as the measure's validity, reliability, economy, and ease of
interpretation (Stanley & Hopkins, 1972). To determine the concurrent
validity of a standardized test and, hence, a necessary condition for
its utility, researchers compared reading grade equivalent scores from
a standardized test with teacher ratings and with placement test
ratings. Interest focused on determining such validity for
low-achieving primary grade readers in an inner city school when the
children took the standardized test according to a graded testing
plan. A graded testing plan calls for all children in a grade to take
the same level of test regardless of the children's varying
achievement levels.

After four years of classroom experience and observation as a
primary grade teacher, the principal author questioned how well
standardized test scores reflected children's actual reading levels
for these low-achieving readers tested under a graded testing policy.
During classroom testing, observations showed that a child tended to
guess answers or mark responses randomly when the standardized test
was at or above his or her frustration level, that is, the level one
or more years above the child's instructional reading level. The child
who could not read the test experienced frustration, while the child
who could read the test tended to take it as directed. Children who
felt a need to guess answers or mark random responses were not being
measured (Hieronymus & Lindquist, 1971). It appeared their responses
were given not out of a knowledge base, but out of frustration.
Consequently, their scores could not be interpreted as valid measures.
Normally, school systems use standardized test results to help
the teacher with tasks such as determining group and individual
diagnoses and prescriptions, forming intraclass groupings, and
assessing student growth. They use these results to help
administrators perform tasks such as assessing instructional programs,
making decisions concerning planning and grouping, and comparing
school units (Hieronymus & Lindquist, 1974). Therefore, determining
whether a standardized test possesses concurrent validity with actual
reading levels for the type of children identified is of the greatest
importance, since invalid scores are useless to teachers and
administrators alike.

Researchers set out to determine the agreement or disagreement of
standardized test reading ratings with two criterion rating sources:
teacher judgments and reading placement test results. Previous studies
had compared standardized test reading results with informal reading
inventory results (Johns, 1972; McCracken, 1962; Sipay, 1964), so
researchers felt the third rating source, teacher judgments, was
necessary to verify the accuracy of results from a reading placement
test administered independently by a researcher.

Looking at the three studies cited, it was difficult to draw
definitive conclusions regarding standardized test validity, for
results were mixed. Also, the students in these three studies were not
classified as low-achieving readers. McCracken (1962) indicated that
[illegible] of children in his study would have been grouped on
frustration reading levels if the standardized test scores alone had
been the basis for group membership decisions. Sipay (1964) discovered
that standardized test scores tended to overestimate instructional
reading levels, a position commonly held among educators. At the same
time they underestimated frustration reading levels. Johns (1972)
showed that [illegible] of the children in his study were rated one
grade level or more above their instructional reading levels by the
standardized test.
Instruments 

The eight teachers taking part in the study administered the
Vocabulary and Reading Comprehension subtests of the Iowa Tests of
Basic Skills (1971), Primary Battery, Level 8, Form 5 and Regular
Battery, Level 9, Form 5, as part of the yearly testing program in a
Southern urban school system. A researcher administered the Macmillan
Reader Placement Test (1972) individually to children. This test
consisted of two parts: vocabulary recognition and reading selection
comprehension.
Methods and Results 

Two questions were posed for study: Do teacher ratings and
placement test ratings agree on children's reading levels? Do
standardized test ratings possess concurrent validity for the two
criterion source ratings of children's reading levels? To answer these
questions, researchers proposed the statistical null hypotheses that
there would be no significant differences between reading ratings from
(1) teacher judgments and placement test results, (2) teacher
judgments and standardized subtest vocabulary results, (3) teacher
judgments and standardized subtest comprehension results, (4)
placement test results and standardized subtest vocabulary results,
and (5) placement test results and standardized subtest comprehension
results.

A sample of 201 second and third grade children in an urban
elementary school was chosen for several reasons: the majority of
children read one year or more below grade level according to school
records; the children were administered a standardized test according
to a graded testing plan; and the children, teachers, and test scores
were made available to researchers by the school system for the
validation study. The school chosen was a Title I school located in a
low income housing project. The sample included only 201 of the 230
children enrolled because children who did not take all tests used in
the comparisons were excluded. Of the children in the sample,
[illegible] read on a level half a year or more below grade level,
while [illegible] read on a level one year or more below grade level
according to school records.

Primary children were selected as the target group because of a
need for an examination of standardized test concurrent validity for
this age group. Focusing on reading ratings seemed highly appropriate
for children in the primary grades, where great emphasis normally is
placed on reading instruction. Two major advantages also influenced
the limiting of the study to an examination of reading ratings.
Previous classroom experience in administering reading placement tests
was an advantage to researchers, and the fact that teachers already
had judged their students' reading levels in the course of regular
instruction was an advantage to them.

The Iowa Tests of Basic Skills (ITBS) were administered by the eight
classroom teachers to all children, except the mentally and physically
handicapped and those habitually absent, during May, 1974. In the
weeks immediately preceding and following the ITBS administration, a
researcher administered the Macmillan Reader Placement Test. The
placement test was based on the children's basal reader series and was
administered individually. Placement test administration occurred
independently of teacher ratings.

After the placement tests were completed, each of the eight teachers
was given a list of her students on which to indicate reading grade
levels. Teachers were asked to make this decision for each child
according to the child's basal reader instructional level. For
example, a child reading in a second grade first semester reader would
be rated 2¹ as his/her reading level. To ensure comparability of
scores, the grade level ratings from all three sources were classified
as seen in Table 1. Standardized test ratings are grade equivalent
scores. This classification allowed each child to vary within one-half
school year, or within five school months.

TABLE 1

Classification of Grade Level Ratings

Teacher     Placement       Standardized     Study
Ratings     Test Ratings    Test Ratings     Classification

R           R               0.0-0.9           .5
Pp/P        Pp/P            1.0-1.4          1.0
1           1               1.5-1.9          1.5
2¹          2¹              2.0-2.4          2.0
2²          2²              2.5-2.9          2.5
3¹          3¹              3.0-3.4          3.0
3²          3²              3.5-3.9          3.5
4           4               4.0-4.9          4.0
5           5               5.0-5.9          5.0
6           6               6.0-6.9          6.0

Note. Fourth, fifth, and sixth grade readers cover one school
year. Children placed in these readers may vary within this range.
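
The banding in Table 1 can be expressed as a small lookup. The following is a minimal sketch, assuming grade equivalent scores arrive as decimal numbers; the names `BANDS` and `classify` are illustrative and do not come from the study.

```python
# Lower bound of each grade-equivalent band in Table 1, paired with the
# study classification assigned to that band (hypothetical encoding).
BANDS = [
    (0.0, 0.5),
    (1.0, 1.0),
    (1.5, 1.5),
    (2.0, 2.0),
    (2.5, 2.5),
    (3.0, 3.0),
    (3.5, 3.5),
    (4.0, 4.0),
    (5.0, 5.0),
    (6.0, 6.0),
]
UPPER_LIMIT = 6.9  # Table 1 ends with the band 6.0-6.9


def classify(grade_equivalent):
    """Return the Table 1 study classification for a grade-equivalent score."""
    if not 0.0 <= grade_equivalent <= UPPER_LIMIT:
        raise ValueError("score lies outside the range covered by Table 1")
    result = BANDS[0][1]
    for lower_bound, classification in BANDS:
        # Keep the classification of the last band whose lower bound is reached.
        if grade_equivalent >= lower_bound:
            result = classification
    return result


if __name__ == "__main__":
    for score in (0.9, 2.3, 3.7, 4.6):
        print(score, "->", classify(score))
```

Bands below grade four span half a school year, while the bands for grades four through six span a full year, which matches the note under Table 1.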



The sample was divided into eight homeroom units, intact groups
extant at the school, for data analysis. Since teacher and placement
test ratings included vocabulary and comprehension skills, an
examination of both the vocabulary and comprehension subtests of the
ITBS was needed.
As a measure of the concurrent validity between the standardized
test ratings and the criterion source ratings, correlation coefficients
were obtained for grade level ratings between every pair of rating
sources, as shown in Table 2 (see Appendix A for correlations by
classes).

TABLE 2

Correlations Between Rating Sources: Total Sample

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension                     1.00
Teacher              .51           .38         1.00
Placement Test       .55           .35          .91         1.00

n = 201.

Stanley and Hopkins (1972) explained concurrent validity as the
extent of correlation between two concurrently obtained criteria. In
this sense, the ITBS ratings possessed concurrent validity only to the
extent to which they correlated with teacher and placement test
ratings. Correlations were strongest for comparisons of teacher
ratings with placement test ratings and were weakest for comparisons
of comprehension subtest ratings with both teacher and placement test
ratings.

Grade level scores determined by the three rating sources were also
studied for each child. Several interesting facts were revealed.
Teacher and placement test ratings differed one year or more in only
[illegible] of the cases. ITBS vocabulary ratings overestimated
teacher ratings one year or more in [illegible] of the cases and
underestimated teacher ratings in 20% of the cases. ITBS comprehension
ratings overestimated teacher ratings one year or more in 15% of the
cases and underestimated teacher ratings one year or more in
[illegible] of the cases. ITBS vocabulary ratings overestimated
placement test ratings one year or more in 12% of the cases and
underestimated placement test ratings one year or more in 20% of the
cases. ITBS comprehension ratings overestimated placement test ratings
one year or more in 14% of the cases and underestimated placement test
ratings one year or more in [illegible] of the cases.

Such an examination of difference scores between rating sources
showed that standardized test scores neither overestimated nor
underestimated teacher and placement test ratings with any
consistency. Standardized test ratings differed from the two criterion
source ratings one year or more in either direction in 32% to
[illegible] of the cases, depending on the comparison.
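
The difference-score tally described above can be sketched in a few lines. This is an illustration only: the function name `difference_summary` and the sample ratings are hypothetical and are not data from the study. It counts how often a test rating sits one year or more above or below a criterion rating for the same child.

```python
def difference_summary(test_ratings, criterion_ratings, threshold=1.0):
    """Percent of cases where the test rating is at least `threshold`
    grade levels above (over) or below (under) the criterion rating."""
    if len(test_ratings) != len(criterion_ratings):
        raise ValueError("both rating lists must cover the same children")
    if not test_ratings:
        raise ValueError("at least one pair of ratings is required")
    n = len(test_ratings)
    pairs = list(zip(test_ratings, criterion_ratings))
    over = sum(1 for test, crit in pairs if test - crit >= threshold)
    under = sum(1 for test, crit in pairs if crit - test >= threshold)
    return {
        "over_pct": 100 * over / n,
        "under_pct": 100 * under / n,
        "either_pct": 100 * (over + under) / n,
    }


if __name__ == "__main__":
    # Hypothetical ratings for five children (not taken from the study).
    itbs = [2.5, 1.0, 3.0, 2.0, 1.5]
    teacher = [1.5, 2.0, 3.0, 2.5, 1.5]
    print(difference_summary(itbs, teacher))
```

Reporting overestimates and underestimates separately, as the study does, keeps a test that errs in both directions from looking accurate on average.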

An analysis of variance using a randomized block design, with the
rating sources as treatments and the four classes of each grade as
blocks, was conducted for each grade at the .05 significance level.
The analysis of variance in grade two resulted in rejection of the
null hypothesis for comparisons between (1) teacher and ITBS
vocabulary ratings, (2) teacher and ITBS comprehension ratings, and
(3) placement test and ITBS comprehension ratings. In grade three the
null hypothesis was not rejected for any pair of raters. Results for
these tests are given in Appendix B.
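
The arithmetic behind each Appendix B table can be retraced from its sums of squares and degrees of freedom. The sketch below is a reading aid under that assumption, not the authors' computation; the function name `anova_f_ratios` is hypothetical. Each mean square is a sum of squares divided by its degrees of freedom, and each F ratio is a mean square divided by the residual mean square.

```python
def anova_f_ratios(ss_raters, ss_classes, ss_residual,
                   df_raters, df_classes, df_residual):
    """Mean squares and F ratios for a randomized block layout with
    raters as treatments and classes as blocks."""
    ms_raters = ss_raters / df_raters
    ms_classes = ss_classes / df_classes
    ms_residual = ss_residual / df_residual
    return {
        "ms_raters": ms_raters,
        "ms_classes": ms_classes,
        "ms_residual": ms_residual,
        "f_raters": ms_raters / ms_residual,
        "f_classes": ms_classes / ms_residual,
    }


if __name__ == "__main__":
    # Sums of squares and degrees of freedom as printed in Appendix B for
    # placement test versus comprehension subtest ratings, grade two.
    result = anova_f_ratios(6.16, 11.70, 38.77, 1, 3, 185)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

The printed tables appear to round each mean square before dividing, so ratios computed this way can differ from the printed F values in the second or third decimal place.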
Conclusions 

One of the most interesting findings of the study was that, for the
group of children tested, standardized test ratings did not
consistently overestimate instructional reading levels; instead, the
standardized test ratings both overestimated and underestimated
instructional reading levels as determined by teacher judgments and
placement test ratings.

Differences between the standardized test ratings and the two
criterion source ratings of one year or more in 32% to [illegible] of
the cases raised questions as to the standardized test's usefulness
for teachers and other professional educators. The fact that the
differences were not consistently in one direction further clouded the
issue of their usefulness, since a consistent difference in either
direction could provide information that could be used.

Correlation analysis also supported the notion of little concurrent
validity for the standardized test ratings with instructional reading
levels for the low-achieving readers in the study. A test's concurrent
validity may be measured by the extent to which it correlates with a
concurrently obtained criterion. In the present study, teacher and
placement test ratings correlated to a substantial degree for all
eight classes and the total sample, with correlations from .85 to .96.
Ratings from teachers and the ITBS vocabulary subtest correlated from
.35 to .68 for the classes, with a correlation of .51 for the entire
sample.

Teacher and ITBS comprehension ratings correlated from .13 to .64,
with a total sample correlation of .38. The overall correlation
probably was misleading, since classes one through four, the second
grade classes, had correlations of .58 to .64, while classes five
through eight, the third grade classes, had correlations of .13 to
.37. This same trend was evident in the correlation between placement
test and ITBS comprehension ratings. Correlations between ratings from
these two sources for the classes ranged from .12 to .61, with a total
sample correlation of .35.
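
The point that a total sample correlation can hide differences between groups can be checked by computing Pearson's r once for the pooled sample and once per group. The following is a minimal sketch; the function names and the ratings in the usage block are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def correlations_by_group(records):
    """records: iterable of (group, x, y) triples.
    Returns (pooled_r, {group: r}) so the two views can be compared."""
    records = list(records)
    groups = {}
    for group, x, y in records:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    pooled = pearson_r([x for _, x, _ in records], [y for _, _, y in records])
    per_group = {g: pearson_r(xs, ys) for g, (xs, ys) in groups.items()}
    return pooled, per_group


if __name__ == "__main__":
    # Hypothetical (grade, teacher rating, test rating) triples.
    data = [
        ("grade 2", 1.5, 1.5), ("grade 2", 2.0, 2.0),
        ("grade 2", 2.0, 2.5), ("grade 2", 2.5, 2.5),
        ("grade 3", 2.0, 3.0), ("grade 3", 2.5, 2.0),
        ("grade 3", 3.0, 2.5), ("grade 3", 3.0, 3.5),
    ]
    pooled, per_group = correlations_by_group(data)
    print("pooled:", round(pooled, 2))
    for grade, r in per_group.items():
        print(grade, round(r, 2))
```

Reporting the per-group values next to the pooled value, as Appendix A does by class, makes it visible when one group drives or dilutes the overall coefficient.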
Speculation about the difference between the second and third grade
correlations for the ITBS comprehension ratings and the two criterion
source ratings led to a closer examination of these ratings. It
revealed that more second graders read on a second grade level,
[illegible], than third graders read on a third grade level, 29%.
Lower correlations were obtained between ITBS comprehension ratings
and both teacher and placement test ratings in the third grade, where
a smaller percentage of children read on grade level. This phenomenon
is consistent with the original classroom observation that children
who could not read a standardized test obtained scores that varied
markedly from teachers' estimates of instructional reading levels.

For correlations between placement test and ITBS vocabulary ratings
for the eight classes, the lower correlations again occurred in two
third grade classes, although the trend noted above was not as clearly
delineated for the ITBS vocabulary subtest as for the ITBS
comprehension subtest comparisons. For the total sample, placement
test and ITBS vocabulary ratings correlated .55.

Standardized test ratings correlated with both teacher ratings and
placement test ratings within a range of .12 to .68. None of these
correlations, however, approached the strength of the correlations
between teacher and placement test ratings, which ranged from .85 to
.96. The obtained correlations supported the hypotheses of agreement
between teacher and placement test ratings and disagreement between
the two criterion source ratings and the standardized test ratings.

Hypothesis testing was not a particularly fruitful technique in
the study. However, the research questions more appropriately
concerned information about individuals rather than group means only.






Educational Importance 

Ihe administration of standardised tests constitutes a major por- 
tion of many school system testing programs • Ihey are d^sigaed to 
help teac;hers and administrators in their professional tasks. If 
obtained scores are not valid, however, the information they contain 
is useless to teachers ^and administratora for instructional planning 
and Implementation. At worst. Invalid scores can be used In^.a manner 
harmful to a child if the user of the scoras assumes them to b© valid. 
/Results of the pres- it study called into question the use of one stand- 
ardized test for young, low-achieving readers tested according to a 
graded testing policy adrainlstered by the school system. Particularly 
questionable ifere the results of the comprehension subtest for thiyd . 

■ • ! 

i . 

graders In the sample. , ' f 

It would be more appropriate to administer the ITBS in accordance
with either of two alternative plans suggested by the test publishers
(Hieronymus & Lindquist, 1974). An out-of-grade plan calls for
administration of one test level to all children in a grade or
subgrouping within a grade, with the test level being either lower or
higher than actual grade placement, whichever is more suitable. An
individualized plan calls for the administration of an appropriate
test level for each child. Implementing either of the two plans
precludes grade level comparisons with a norming group; however,
administration of inappropriate test levels seems to yield
questionable results. It is suggested that out-of-grade norms be
developed for local school systems to gain more useful information on
an immediate level regarding pupil achievement and instructional
programming. Because of time and cost factors, an out-of-grade plan
probably is more feasible.




Until alternatives to the graded testing plan are found for low
achievers who are classed by grade levels according to age,
standardized test score interpretations should be made with caution.
It is important that professionals realize the pointlessness of giving
standardized tests to children who cannot read them. Such testing
experiences can only result in frustration for children and useless
information for teachers.
APPENDIX A
Correlation Tables by Classes 



TABLE A

Correlations Between Rating Sources: Class 1

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension                     1.00
Teacher                            .58         1.00
Placement Test                     .56          .91         1.00

n = 25

Vocabulary column, row not recoverable: .57

TABLE B

Correlations Between Rating Sources: Class 2

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension                     1.00
Teacher              .5            .61         1.00
Placement Test       .50           .57          .85         1.00

n = 23




TABLE C

Correlations Between Rating Sources: Class 3

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension        .56          1.00
Teacher              .68                       1.00
Placement Test       .66           .56          .87         1.00

n = 24
TABLE D

Correlations Between Rating Sources: Class 4

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension        .58          1.00
Teacher              .6?           .63         1.00
Placement Test       .68           .61          .95         1.00

n = 23



TABLE E

Correlations Between Rating Sources: Class 5

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension                     1.00
Teacher                                        1.00
Placement Test                                  .91         1.00

n = 25

Comprehension column, row not recoverable: .33



TABLE F

Correlations Between Rating Sources: Class 6

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension                     1.00
Teacher                            .37         1.00
Placement Test                     .31          .96         1.00

n = 26

Vocabulary column, row not recoverable: .36






TABLE G

Correlations Between Rating Sources: Class 7

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension        .2?          1.00
Teacher              .35                       1.00
Placement Test       .35           .12          .91         1.00

n = 29



TABLE H

Correlations Between Rating Sources: Class 8

                 Vocabulary   Comprehension   Teacher   Placement Test

Vocabulary          1.00
Comprehension       -.08          1.00
Teacher              .61           .13         1.00
Placement Test       .64           .19          .91         1.00

n = 26
APPENDIX B
Analysis of Variance Tables by Grades

TABLE A
Analysis of Variance Table
Teacher and Placement Test: Grade Two

Source                 SS        df      MS         F

Between Raters        0.0009      1     0.0009     0.0018
Between Classes       5.3300      3     1.7770     3.4721**
Residual             94.6791    185     0.5118

Total               100.0100    189

**p < .01












TABLE B
Analysis of Variance Table
Teacher and Vocabulary Subtest: Grade Two

Source                 SS        df      MS         F

Between Raters        2.10        1     2.1000     4.0268*
Between Classes                   3     1.5600     2.9914**
Residual             96.47      185     0.5215

Total               103.24      189

*p < .05












TABLE C
Analysis of Variance Table
Teacher and Comprehension Subtest: Grade Two

Source                 SS        df      MS         F

Between Raters        6.16        1     6.1600    50.0813**
Between Classes      11.81        3     3.9400    32.0325**
Residual             22.76      185     0.1230

Total                40.73      189

**p < .01



TABLE D
Analysis of Variance Table
Placement Test and Vocabulary Subtest: Grade Two

Source                 SS        df      MS         F

Between Raters        2.10        1     2.1000     3.
Between Classes       2.28        3     0.7600     1.22
Residual                        185     0.6211

Total               119.28      189



TABLE E
Analysis of Variance Table
Placement Test and Comprehension Subtest: Grade Two

Source                 SS        df      MS         F

Between Raters        6.16        1     6.1600    29.3890**
Between Classes      11.70        3     3.9000    18.6069**
Residual             38.77      185     0.2096

Total                56.63      189

**p < .01



TABLE F
Analysis of Variance Table
Teacher and Placement Test: Grade Three

Source                 SS        df      MS         F

Between Raters        0.14        1     0.1400     0.2642
Between Classes      55.65        3    18.5500    35.0066**
Residual            109.69      207     0.5299

Total               165.48      211

**p < .01
TABLE G
Analysis of Variance Table
Teacher and Vocabulary Subtest: Grade Three

Source                 SS        df      MS         F

Between Raters        0.02        1     0.0200     0.0309
Between Classes       1.39        3     0.4600     0.7101
Residual            134.09      207     0.6478

Total               135.50      211








TABLE H
Analysis of Variance Table
Teacher and Comprehension Subtest: Grade Three

Source                 SS        df      MS         F

Between Raters        0.48        1     0.4800     0.7157
Between Classes       6.01        3     2.0000     3.1070*
Residual            133.25      207     0.6437

Total               139.74      211

*p < .05

TABLE I
Analysis of Variance Table
Placement Test and Vocabulary Subtest: Grade Three

Source                 SS        df      MS         F

Between Raters        0.27        1     0.2700     0.4189
Between Classes       1.74        3     0.5800     0.8998
Residual            133.43      207     0.6446

Total               135.44      211



TABLE J
Analysis of Variance Table
Placement Test and Comprehension Subtest: Grade Three

Source                 SS        df      MS         F

Between Raters        0.08        1     0.0800     0.1251
Between Classes       6.50        3     2.1700     3.3927*
Residual            132.39      207     0.6396

Total               138.97      211

*p < .05



